Living cells often need to extract information from biochemical signals that are noisy. We study 
how accurately cells can measure chemical concentrations with signaling networks that are linear. 
For stationary signals of long duration, they can reach, but not beat, the Berg-Purcell limit, which 
relies on uniformly averaging in time the fluctuations in the input signal. For short times or non- 
stationary signals, however, they can beat the Berg-Purcell limit, by non-uniformly time-averaging 
the input. We derive the optimal weighting function for time averaging and use it to provide the 
fundamental limit of measuring chemical concentrations with linear signaling networks. 
Cells measure concentrations of chemicals via receptors 
on their surface. These measurements, however, are in- 
evitably corrupted by noise that arises from the stochas- 
tic arrival of ligand molecules at the receptor by diffusion 
and from the stochastic binding of the ligand to the recep- 
tor. Biochemical networks that transmit the information 
on the ligand concentration from the surface of the cell 
to its interior often have to filter this noise extrinsic to 
the cell as much as possible. However, how the capacity 
of signaling networks to remove extrinsic noise depends 
on their design, and what the fundamental limits to this 
capacity are, remain open questions. 

Several studies have addressed the question of how accurately
the ligand concentration $c$ can be estimated from
the time trace of the number of ligand-bound receptors,
$S(t)$, over some integration time $T$ [THE]. Berg and Purcell
assumed that the estimate $\hat{c}$ with least error is the one
that matches the observed time average of the stochastic
signal $S(t)$, $\bar{S} = \frac{1}{T}\int_0^T S(t)\,dt$, giving all the signal
values equal weight in the average pQ. When $S(t)$ is
stationary, with mean $\mu_S$, variance $\sigma_S^2$, and correlations
that decay exponentially over a time $\tau_c$, the estimate $\hat{c}$
has variance (error) [3l El [10]:
More recently, Mora, Endres, and Wingreen showed that,
when $T \gg \tau_c$, maximum likelihood estimation produces
an estimate that is better by 50%, since the time average
includes noise from stochastic ligand unbinding, which
provides no information about the ligand concentration
[5, 8].

While these previous studies have considered how 
much information about the ligand concentration is en- 
coded in receptor-occupancy time traces, they do not 
address the question how much information biochemi- 
cal networks can actually extract from these time traces. 
To extract all the information, the biochemical networks 
downstream of the receptors would need to construct a 
maximum likelihood estimate [SJ [5]. However, it is not 
clear that typical biochemical networks do this, nor is 
it clear that they time-average signals uniformly as in 
the Berg-Purcell estimate. Moreover, the previous anal- 
yses [IH5] assumed an integration time T, but what time 
scales in the processing network actually set the inte- 
gration time remains unclear. We therefore study how 
accurately biochemical networks can estimate the ligand 
concentration from receptor time traces. 

We focus on a simple but broad class of signaling networks:
linear networks. Many networks respond linearly
over the range of fluctuations in their input (e.g.
[H[T21[T3]), and a systematic study can be done analytically.
Since the effects of noise intrinsic to the molecular
interactions inside cells have been well studied [TJ IT4HI6],
we focus on networks in the deterministic limit. This enables
us to understand the unique effects due to the noise
in the input signal.

Linear networks time-average the input signal, but 
do not generally give rise to uniformly weighted time- 
averages. We study how different signaling motifs sculpt 
the weighting of the signal as a function of time, and 
how this affects the precision of ligand sensing. While 
linear networks cannot extract all of the information in 
the input signal (i.e. the maximum likelihood estimate 
[5]), they can, surprisingly, reach the Berg-Purcell limit 
and even exceed it by 12%; this is because the optimal 
weighting is non-uniform, in contrast to the Berg-Purcell 
estimate. We show that a simple network based on a feed-forward
loop, a common motif in biochemical networks [18],
can reach the bound for linear signaling networks, 
and we elucidate the combination of time scales that sets 
the effective integration time. We conclude by studying 
how reliably biochemical signaling networks can extract 
information from non-stationary signals. 

We consider a cell that responds after a finite time $T_o$
to a change in its environment which happens at time
$t = 0$. This time $T_o$ is the observation time, which, as we
discuss below, provides an upper bound to the integration
time. As before, the receptor time trace provides the
signal to the cell, $S(t)$. To compare to previous results,
we initially assume that the change in the environment,
and therefore the ligand concentration, is instantaneous,
and that the receptors immediately adjust. Moreover, we
assume that the fluctuations in $S(t)$ decay exponentially
with correlation time $\tau_c$ [THK^O]. We neglect stochasticity
in the time $T_o$, and, as mentioned above, the intrinsic
noise in the processing network. The capacity of the
cell to respond is then limited by the information in the
stationary input $S(t)$ from time 0 to $T_o$ (Fig. 1a).

FIG. 1: Responding to noisy environments. (a) The environment
changes instantaneously at time $t = 0$, and the number
of bound receptors, $S(t)$, adjusts instantaneously. $S(t)$
is stationary between time 0 and the time $T_o$, when the cell
responds. The signaling network is either in a steady state by
time $T_o$, independent of the initial condition (b), or in a
basal state at time $t = 0$ (c). $X(t)$ denotes the
number of X molecules at time $t$. The solid and dashed lines
in panel b represent different initial conditions.

As a measure of how much information the cell can extract,
we determine how accurately the ligand concentration
can be estimated from the molecular output $X$ of the
processing network at the time $T_o$ of the response, assuming
the response is made instantaneously based on $X(T_o)$.
As illustrated below in examples, the output of a linear
signaling network is $X(T_o) = \int_{-\infty}^{T_o} f(T_o - t')\,S(t')\,dt'$,
where the unnormalized weighting (response or transfer)
function $f(\Delta t = T_o - t')$ reflects how the processing network
at time $T_o$ weights the signal at an earlier time $t'$
[21]. To compare to previous results, we assume that
either: (1) $f(T_o - t) = 0$ for $t < 0$, which corresponds
to a scenario where the response time $\tau_r$ of the network
is shorter than $T_o$, or, equivalently, the network reaches
steady state by the time $T_o$ (Fig. 1b); or (2) $S(t) = 0$ for
$t < 0$, which corresponds to a scenario in which the cell
is initially in a basal state (Fig. 1c). In both cases, we
then have $X(T_o) = \int_0^{T_o} f(T_o - t')\,S(t')\,dt'$. When neither
$f(T_o - t)$ nor $S(t)$ is zero for $t < 0$, previous states
of the environment, corresponding to $t < 0$, influence the
state of the signaling network at time $T_o$. Such previous
environmental states can be a source of additional noise
in $X(T_o)$, complicating inference of the current environmental
state, as well as a source of information, helping
inference, if environmental transitions are correlated.

We start by considering a simple linear signaling network,
a reversible one-level cascade, in which the output
molecule X is directly activated by the receptor with
rate constant $k_f$ and can be degraded with rate constant
$k_b$ (Fig. 2). Then, deterministically, $dX/dt = k_f S - k_b X$.
The response of this network at time $T_o$ is
$X(T_o) = \int_0^{T_o} f(T_o - t)\,S(t)\,dt + g(T_o)\,X(0)$ with
$f(\Delta t) = k_f \exp(-k_b \Delta t)$ and $g(T_o) = \exp(-k_b T_o)$ (Fig. 2). We
neglect the term $g(T_o)X(0)$ for the reasons mentioned
above: either because $T_o$ is larger than the response time
$\tau_r = 1/k_b$, in which case $g(T_o) \approx 0$, or because the initial
state is ligand-free and $X(0) \approx 0$. We note that
the weighting function $f(\Delta t)$ decays with increasing $\Delta t$,
which means that more weight is placed on more recent
values of the input signal. This is because the decay reaction
is least likely to have degraded the most recently
produced X molecules.
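
The equivalence between the rate equation and the weighted time integral can be checked numerically. The following is a minimal sketch, not taken from the original text: the rate constants, the observation time, and the smooth test input `S` are hypothetical values chosen only for illustration, and forward Euler is used purely for simplicity.

```python
import numpy as np

# Hypothetical parameters, chosen only for illustration.
k_f, k_b, T_o = 2.0, 1.0, 5.0
n = 50_000
t = np.linspace(0.0, T_o, n + 1)
h = T_o / n
S = 1.0 + 0.5 * np.sin(3.0 * t)  # hypothetical smooth input, not a real receptor trace

# Route 1: integrate dX/dt = k_f*S - k_b*X with forward Euler, starting from X(0) = 0.
X_ode = 0.0
for s in S[:-1]:
    X_ode += h * (k_f * s - k_b * X_ode)

# Route 2: weight the input with f(dt) = k_f*exp(-k_b*dt), using the trapezoidal rule.
integrand = k_f * np.exp(-k_b * (T_o - t)) * S
X_conv = h * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))

print(X_ode, X_conv)  # the two routes should agree up to discretization error
```

Both routes give the same output up to the step-size error, which is the sense in which the cascade "time-averages" its input with an exponentially decaying weight.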

We now address the question of how the departure from
uniform weighting affects the error in the estimate of the
concentration. Following the derivation of Eq. 1 [T],
an estimate of the ligand concentration from $X(T_o)$ has
variance

$$\sigma_{\hat c}^2[X(T_o)] = \sigma_{X(T_o)}^2 \Big/ \left(d\mu_{X(T_o)}/dc\right)^2, \qquad (2)$$

where the mean $\mu_{X(T_o)}$ of $X(T_o)$ is a linear function of $c$
over the range of fluctuations in $X(T_o)$. Using
$X(T_o) = \int_0^{T_o} f(T_o - t')\,S(t')\,dt'$, we then arrive at (see supplement)

$$\sigma_{\hat c}^2[X(T_o)] = \sigma_{\hat c}^2[S] \times \int_0^{T_o}\!\!\int_0^{T_o} \hat f(T_o - t_1)\,\hat C(t_1, t_2)\,\hat f(T_o - t_2)\,dt_1\,dt_2. \qquad (3)$$

Here, $\sigma_{\hat c}^2[S] = \sigma_S^2/(d\mu_S/dc)^2$ is the error of an estimate
based on an instantaneous observation of the signal $S(t)$.
The reduction in error, resulting from averaging the fluctuations
in the input signal over time, depends on the
normalized correlation function of the input fluctuations,
$\hat C(t_1, t_2) = \exp(-|t_2 - t_1|/\tau_c)$, and on the normalized
weighting function, $\hat f(\Delta t) = f(\Delta t)/\int_0^{T_o} f(\Delta t')\,d\Delta t'$.
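
As a rough numerical illustration of Eq. 3 (a sketch, not part of the original analysis), the double integral can be approximated on a midpoint grid for any weighting function. The helper name `error_ratio`, the grid size, and the values $T_o = 3$, $\tau_c = 1$, $k_b = 1$ are assumptions made here for illustration.

```python
import numpy as np

def error_ratio(f, T_o, tau_c, n=1000):
    """Approximate sigma^2[X(T_o)] / sigma^2[S] from Eq. 3 on a midpoint grid.

    f is the unnormalized weighting function of dt = T_o - t (vectorized).
    """
    h = T_o / n
    t = (np.arange(n) + 0.5) * h              # midpoints of [0, T_o]
    w = f(T_o - t)
    w = w / (w.sum() * h)                     # normalize: integral of f_hat is 1
    C = np.exp(-np.abs(t[:, None] - t[None, :]) / tau_c)
    return float(w @ C @ w) * h * h

T_o, tau_c = 3.0, 1.0                         # hypothetical values
uniform = error_ratio(lambda dt: np.ones_like(dt), T_o, tau_c)
one_level = error_ratio(lambda dt: np.exp(-1.0 * dt), T_o, tau_c)  # k_b = 1
print(uniform, one_level)
```

With these values the exponentially weighted (one-level reversible) average gives a larger error ratio than the uniform average, in line with the comparison made in the text.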

Fig. 2d shows that the one-level reversible cascade extracts
less information from the input signal than a network
that averages the input uniformly over time. Only
when $k_b$ goes to zero, and $f(\Delta t) \propto \exp(-k_b \Delta t) \approx 1$, does
the network, which now becomes an irreversible one-level
cascade, implement uniform time averaging and
extract the same amount of information. Intuitively, degrading
X destroys information. While degradation is
required to make a signaling network responsive to new
environments, this example shows that it may be useful
to make degradation as weak as possible or to physically
separate the receptors and deactivating enzymes (e.g. in
different domains on the membrane [12]), such that X is
deactivated only after the response has been made.

Signaling networks typically consist of more than one
layer, which makes it possible to sculpt the weighting
function $f(\Delta t)$. As an illustration, we first consider
an irreversible cascade consisting of $N$ layers/species:
$dI_i/dt = k_f I_{i-1}$, where $i = 1, \ldots, N$ and $I_0 = S$. Assuming
$X(0) \approx 0$, $X(T_o) = \int_0^{T_o} f(T_o - t)\,S(t)\,dt$, where
the weighting function now behaves as $f(\Delta t) \propto \Delta t^{N-1}$.
Such cascades place more weight on early values of the
input signal, which have had more time to propagate
through the network (Fig. 2b). As a result, they under-utilize
(down-weight) the most recent information in the
signal, and indeed, these cascades perform worse than a
strict average of the input (Fig. 2).

FIG. 2: Extracting information from noisy input signals with linear signaling networks. (a,c,e,g) The weighting functions
corresponding to different signaling networks are not uniform. (b,d,f,h) The ability of a signaling network to measure ligand
concentration depends on its weighting function. The typical error (variance) in the estimate of ligand concentration is plotted
as a percentage increase over the error of an estimate based on uniform weighting, assumed in the Berg-Purcell limit (Eq. 1
with $T = T_o$). (a) Reversible, one-level cascades selectively amplify late ($t = T_o$) values of the signal, (b) leading to worse
performance than the uniform average. (c) Irreversible, $N$-level cascades amplify early ($t = 0$) values of the signal, (d) leading
to worse performance than the uniform average. (e) The optimal weighting function, given by Eq. 4, averages the signal,
selectively amplifying less correlated values. The delta functions are truncated for illustration. (f) The optimal weighting
function outperforms the uniform average. (g) A signaling network consisting of two branches, which selectively amplify late
($t = T_o$) (left branch) and early ($t = 0$) (right branch) values of the signal, approximates the optimal weighting function
($k_1 = 4.4\,k_3 k_4 T_o$; $k_2 = 20/T_o$; $k_4 = 0.35\,k_3$; $\hat f$ independent of $k_3$, $k_5$; k& ^> To). (h) The network in (g) can outperform the
uniform average.

The above formalism can be generalized to arbitrarily 
large linear signaling networks. Multilevel reversible cas- 
cades have weighting functions that peak some finite time 
in the past, balancing the down-weighting of the signal 
from the distant past due to the reverse reactions, with 
the down-weighting of the signal from the recent past 
resulting from the multi-level character of the network 
(see supplement). More generally, linear combinations of 
the weighting functions for reversible and irreversible cas- 
cades can be achieved with multiple cascades that are ac- 
tivated by the input in parallel and which independently 
activate the same effector molecule, as we demonstrate 
below. Clearly, signaling networks allow for very diverse 
weighting functions. 

This raises the question of whether there exists an optimal
weighting function $\hat f^*(\Delta t)$ that minimizes the error
in the estimate of the ligand concentration. To this end,
we differentiate Eq. 3 with respect to $\hat f$ using Lagrange
multipliers that constrain the integral of $\hat f$ to 1, to find
the optimal (normalized) weighting function:

$$\hat f^*(\Delta t) = (1 - w^*)\,\frac{1}{T_o} + w^*\,\frac{\delta(\Delta t) + \delta(\Delta t - T_o)}{2}. \qquad (4)$$

The first term places equal weight on all prior values of
the input, as assumed in previous studies [THUG]. The
second term, however, places greater weight on the first
and last observed values of the signal, which are the two
signal values that are the least correlated. Indeed, this is
the central result of this manuscript: the optimal weighting
function does not correspond to uniform weighting of
all signal values. How much weight is placed on the first
and last points is determined by $w^* = 2\tau_c/(T_o + 2\tau_c)$, which
decreases from one to zero as the ratio of the response time $T_o$
to the correlation time $\tau_c$ increases.
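
The structure of the optimal weighting can be reproduced with a small discretized calculation. The sketch below is an illustration under stated assumptions, not the authors' derivation: it samples the signal on a uniform grid, minimizes the quadratic form of Eq. 3 subject to the weights summing to one (whose stationarity condition is a linear system), and compares the result with the closed forms $w^* = 2\tau_c/(T_o + 2\tau_c)$ and $1/(T_o/(2\tau_c) + 1)$. The function name `optimal_weights`, the grid size, and the parameter values are hypothetical.

```python
import numpy as np

def optimal_weights(T_o, tau_c, n=600):
    """Minimize w^T C w subject to sum(w) = 1 for samples at n + 1 grid points."""
    t = np.linspace(0.0, T_o, n + 1)
    C = np.exp(-np.abs(t[:, None] - t[None, :]) / tau_c)
    w = np.linalg.solve(C, np.ones(n + 1))   # stationarity: C w = lambda * 1
    w /= w.sum()                             # enforce the normalization constraint
    return t, w, float(w @ C @ w)

T_o, tau_c = 3.0, 1.0                        # hypothetical values
t, w, err = optimal_weights(T_o, tau_c)
w_star = 2 * tau_c / (T_o + 2 * tau_c)

print(err, 1 / (T_o / (2 * tau_c) + 1))      # discretized vs. closed-form error ratio
print(w[0], w[-1], w_star / 2)               # point masses at the first and last samples
print(w[1:-1].sum(), 1 - w_star)             # total weight spread uniformly in between
```

On a fine grid the first and last samples each carry a finite weight close to $w^*/2$, while every interior sample carries a weight proportional to the grid spacing, which is the discrete counterpart of two delta functions on top of a uniform background.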

The optimal weighting function can be implemented
using common network motifs. For example, the commonly
observed feed-forward loop [18] in Fig. 2 contains
two branches which independently activate X. The left
branch, a one-level reversible cascade, amplifies later values
of the signal ($t \to T_o$); the right branch, a multilevel
irreversible cascade, amplifies earlier ($t \to 0$) values of
the signal. Together, they produce a weighting function
which selectively amplifies less correlated values of the
input (Fig. 2g, h), outperforming the uniform average
that could be obtained by reading out node $I_2$ directly.
This simple network illustrates how a spectrum of protein
lifetimes and cascade levels can be used to shape
weighting functions.

The optimal weighting function $\hat f^*$ also provides the
fundamental limit on the ability of linear signaling networks
to measure chemical concentrations:

$$\sigma_{\hat c}^2[X^*(T_o)] = \frac{\sigma_{\hat c}^2[S]}{T_o/(2\tau_c) + 1}, \qquad (5)$$

which is obtained by combining Eqs. 3 and 4. Eq. 5 has a
simple interpretation: a time series of length $T_o$ contains
an independent observation every time period of the order
of the correlation time, plus one corresponding to the
observation at $t = 0$. Eq. 5 is then the formula for the
variance of the mean of $N = T_o/(2\tau_c) + 1$ independent,
identically distributed random variables.
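
For readers who want the intermediate algebra, the following worked steps sketch how the limit follows from inserting the optimal weighting into the double integral of Eq. 3. The shorthand $a$, $b$ and $E$ is introduced here for convenience and is not notation from the original text; the delta functions at the two end points are assumed to contribute their full weight.

```latex
\begin{align*}
\hat f^*(\Delta t) &= a + b\,\bigl[\delta(\Delta t) + \delta(\Delta t - T_o)\bigr],
  \qquad a = \frac{1}{T_o + 2\tau_c}, \quad b = \frac{\tau_c}{T_o + 2\tau_c}, \\
\int_0^{T_o}\!\!\int_0^{T_o} e^{-|t_1 - t_2|/\tau_c}\,dt_1\,dt_2
  &= 2\tau_c T_o - 2\tau_c^2 E, \qquad E \equiv 1 - e^{-T_o/\tau_c}, \\
\frac{\sigma_{\hat c}^2[X^*(T_o)]}{\sigma_{\hat c}^2[S]}
  &= a^2\bigl(2\tau_c T_o - 2\tau_c^2 E\bigr) + 4ab\,\tau_c E + 2b^2\,(2 - E) \\
  &= \frac{2\tau_c T_o + 4\tau_c^2}{(T_o + 2\tau_c)^2}
   = \frac{2\tau_c}{T_o + 2\tau_c}
   = \frac{1}{T_o/(2\tau_c) + 1}.
\end{align*}
```

The three contributions are the uniform-uniform, uniform-endpoint, and endpoint-endpoint terms; the exponentials cancel exactly, which is why the final expression is so simple.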

The improvement of the optimal weighting function
over uniform weighting (Eq. 1) is maximal when the
observation time is about three correlation times. The
maximum improvement over the sample average is 12%
(Fig. 2). While this improvement over the Berg-Purcell
estimate is modest, and smaller than the 50% improvement
that could in principle be obtained by maximum
likelihood [5, 8], it does show, for the first time, that simple
signaling networks can indeed reach the Berg-Purcell
limit, and even exceed it.
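
The size and location of this maximum can be checked with a few lines of code. This sketch assumes that the uniform-weighting error at finite $T_o$ is the exact variance of a uniform time average of an exponentially correlated signal, $(2/x)\,[1 - (1 - e^{-x})/x]$ with $x = T_o/\tau_c$ (a standard result, used here as an assumption about how Eq. 1 is evaluated), and that the optimal error is $1/(x/2 + 1)$. The function names are hypothetical.

```python
import numpy as np

def uniform_error(x):
    """Error ratio for uniform weighting, x = T_o / tau_c (exact finite-time form)."""
    return (2.0 / x) * (1.0 - (1.0 - np.exp(-x)) / x)

def optimal_error(x):
    """Error ratio for the optimal linear weighting, x = T_o / tau_c."""
    return 1.0 / (x / 2.0 + 1.0)

x = np.linspace(0.05, 50.0, 20_000)
improvement = 1.0 - optimal_error(x) / uniform_error(x)
i = int(np.argmax(improvement))
print(x[i], improvement[i])   # location and size of the maximal relative improvement
```

Under these assumptions the relative reduction in variance peaks at an observation time of roughly three correlation times and amounts to about 12%, vanishing in both the short-time and long-time limits.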

Equally important, our analysis provides a clear perspective
on the integration time. Clearly, $T_o$, the time on
which the cell must respond, provides an upper bound on
the integration time. Yet, the processing network weights
the input signal by $f(T_o - t)$, which may become zero
for $t < T_o$. In this case, the effective integration time
$T_{\rm eff}$ is limited by the range over which $f(T_o - t)$ is non-zero.
For example, the weighting function of the one-level
reversible cascade becomes zero on the time scale
$k_b^{-1} = \tau_X$, the lifetime of the output component. This
can be (much) smaller than $T_o$, in which case $T_{\rm eff}$ is limited
by $\tau_X$: $T_{\rm eff} \sim \tau_X < T_o$. Essentially, degradation
of the output erases memory of the input. However,
our study of multi-level reversible cascades shows that
in general the range over which $f(\Delta t)$ is non-zero can
be longer than the lifetime of the individual components.
Additional intermediate layers not only change the form
of $f(\Delta t)$, but also extend the range over which it is non-zero,
increasing the integration time over which the output
remembers past signals (see supplement).

Values for the correlation time $\tau_c$ of the input signal
and the observation time $T_o$ vary widely across biological
systems. Ligand-receptor half-lives, a key determinant of
$\tau_c$, vary by more than an order of magnitude,
i.e. from milliseconds to an hour [HI [23]. The cell-cycle
time provides an upper bound on $T_o$ [24] (e.g. 45 minutes
in E. coli [24] and 100 minutes in yeast [25]), but signaling
modules and transcriptional responses can make decisions
sooner. Indeed, $T_o$ is not always significantly larger
than $\tau_c$, so that the regime in which linear networks can
beat the Berg-Purcell estimate is biologically relevant.
For example, both the MAPK response to EGF stimulation
[26, 27] and the NF-κB response to TNF stimulation
[2"5] peak on the time scale of ligand-receptor debinding
(10 minutes [23] and 30 minutes [25], respectively). Additionally,
correlation times for gene expression are of the
order of the cell cycle time in both E. coli and human
cells [24, 30], suggesting the finite $T_o$ limit is also important
for scenarios in which intracellular proteins act as
receptors for intracellular signals [2].

Interestingly, when $T_o < \tau_c$, the equilibration time of
the signal must be taken into account, since the equilibration
time is, according to the fluctuation-dissipation
theorem, given by the correlation time, at least when
the change in $c$ is small. Therefore, we end by studying
how signaling networks can extract information from
non-stationary signals. We study an input signal generated
by $\emptyset \rightleftharpoons S$ with $S(0) = 0$ and forward and reverse
rates $k_p c$ and $k_r S$, respectively. This signal increases
to its steady-state value on a time scale $\tau = 1/k_r$,
which also equals the steady-state correlation time $\tau_c$.
Extending the procedure in Eqs. 3 and [IJ, the minimal
estimation error with a linear signaling network is
$\sigma_{\hat c}^2[X^*(T_o)] = \sigma_{\hat c}^2[S] \big/ \left[T_o/(2\tau_c) + \ln(2 - e^{-T_o/\tau_c})/2\right]$ (see supplement).
This shows that less information can be extracted from
non-stationary signals than from stationary ones. To
avoid the detrimental effect of correlations, the optimal
weighting function places more weight on the initial and
final points, as for stationary signals. However, because
there is no information at $t = 0$, the amplification of
early time points is spread over time points $t < \tau_c$ (Fig.
S2). Additionally, the relative amplification of the last
time point increases with decreasing $T_o$. Indeed, when
$T_o \ll \tau_c$, no previous signal values are sufficiently uncorrelated
with the most recent one, and almost all weight
is placed on the final time point $S(T_o)$.

We have studied the ability of linear signaling networks
to extract information from noisy input signals. While
the data processing inequality suggests that it is advantageous
to limit the number of nodes in a signaling network
to minimize the effect of intrinsic noise [14], here we 
show that there can be a competing effect, in terms of in- 
formation processing, in favor of increasing the number 
of nodes: better removal of extrinsic noise. Additional 
nodes make it possible to sculpt the weighting function 
for averaging the incoming signal, allowing signaling net- 
works to reach and even exceed the Berg-Purcell limit. 
Our predictions could be tested experimentally in a con- 
trolled setting by using in vitro or in vivo synthetic sig- 
naling networks [31] . Dual reporter constructs can be 
used to isolate the effects of extrinsic noise, studied in 
this Letter, from noise intrinsic to the signaling machin- 
ery itself [31 [33]. 

This work is part of the research program of 
the "Stichting voor Fundamenteel Onderzoek der 
Estimation error from linear or linearized signaling networks



We provide further details on the derivation of Eq.
3 in the main text for linearized signaling networks.
The proof for linear networks, sketched in the main
text, follows as a special case. For linearized signaling
networks, $\delta X(T_o) = \int_0^{T_o} f_c(T_o - t)\,\delta S(t)\,dt$, where
$\delta S(t) = S(t) - E[S(t)|c]$, $\delta X(t) = X(t) - E[X(t)|c]$, and
$f_c$ can depend on $c$ (because the linearization depends on
the trajectory the network is linearized about). We use
$E[Y]$ to denote the expectation of the random variable
$Y$. The dependence of $f_c$ on $c$ makes the proof of Eq. 3 for
linearized networks more subtle than for linear networks.
We start from Eq. 2 in the main text:

$$\sigma_{\hat c}^2[X(T_o)] = \sigma_{X(T_o)}^2 \Big/ \left(dE[X(T_o)|c]/dc\right)^2. \qquad (6)$$

The variance in the numerator of Eq. 6 is:

$$\begin{aligned}
\sigma_{X(T_o)}^2 &= E\left[(\delta X(T_o))^2\right] \\
&= E\left[\left(\int_0^{T_o} f_c(T_o - t)\,\delta S(t)\,dt\right)^2\right] \\
&= \int_0^{T_o}\!\!\int_0^{T_o} f_c(T_o - t_1)\,E[\delta S(t_1)\,\delta S(t_2)]\,f_c(T_o - t_2)\,dt_1\,dt_2 \\
&= \int_0^{T_o}\!\!\int_0^{T_o} f_c(T_o - t_1)\,C(t_1, t_2)\,f_c(T_o - t_2)\,dt_1\,dt_2. \qquad (7)
\end{aligned}$$

To determine the denominator of Eq. 6,
note that $X(T_o) = E[X(T_o)|c] + \int_0^{T_o} f_c(T_o - t)\,(S(t) - E[S(t)|c])\,dt$.
Taking the expectation at a concentration $c + dc$ yields:
$E[X(T_o)|c + dc] = E[X(T_o)|c] + \int_0^{T_o} f_c(T_o - t)\,(E[S(t)|c + dc] - E[S(t)|c])\,dt$.
Then, because $S$ is stationary:

$$\frac{dE[X(T_o)|c]}{dc} = \frac{dE[S|c]}{dc}\int_0^{T_o} f_c(T_o - t)\,dt. \qquad (8)$$

Inserting Eqs. 7 and 8 into the numerator and denominator,
respectively, of Eq. 6, we find Eq. 3 in the main
text:

$$\sigma_{\hat c}^2[X(T_o)] = \sigma_{\hat c}^2[S] \times \int_0^{T_o}\!\!\int_0^{T_o} \hat f(T_o - t_1)\,\hat C(t_1, t_2)\,\hat f(T_o - t_2)\,dt_1\,dt_2. \qquad (9)$$

The integrals of the weighting function in the denominator
of Eq. 6 normalize the weighting functions in the
numerator of Eq. 6; the correlation function is normalized
by pulling the stationary variance of the signal $S$
into the prefactor.
Multilevel reversible cascades

We consider a reversible cascade consisting of $N$ layers/species
that are each degraded at the same rate, $k_b$:
$dI_i/dt = k_f I_{i-1} - k_b I_i$, where $i = 1, \ldots, N$, $I_0 = S$,
and $X = I_N$. Assuming $I_i(0) \approx 0$, $X(T_o) = \int_0^{T_o} f(T_o - t)\,S(t)\,dt$,
as for the examples in the main text. For $N = 1$,
the network is the one-level reversible cascade studied
in the main text, with $f(\Delta t) \propto \exp(-k_b \Delta t)$. The one-level
reversible cascade places the most weight on the
most recent value of the signal ($\Delta t^* = 0$). The weighting
function for general $N$, which can be determined
by Laplace transforming the governing differential equations,
behaves as $f(\Delta t) = \frac{k_f^N}{(N-1)!}\,\Delta t^{N-1} \exp(-k_b \Delta t)$ (Fig. 3).
The exponential factor, which reflects the reversibility
of the cascade, emphasizes the most recent values
of the signal; the polynomial factor, which reflects the
number of levels, emphasizes older values of the signal.
In combination, the two factors lead to nonmonotonic
weighting functions that peak some finite time in the
past, $\Delta t^* = (N - 1)/k_b$ (Fig. 3). As a result, additional levels
in a cascade can increase the effective integration time
over which the weighting function is nonzero.
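
The form of this weighting function can be checked by integrating the cascade equations directly. The sketch below is illustrative and not from the original text: it computes the impulse response of the $N$-level reversible cascade with a fourth-order Runge-Kutta scheme and compares it with the closed form $k_f^N \Delta t^{N-1} e^{-k_b \Delta t}/(N-1)!$ and with the peak position $(N-1)/k_b$. The function names and the rate constants $k_f = 1.5$, $k_b = 2$ are hypothetical.

```python
import numpy as np
from math import factorial

def impulse_response(N, k_f, k_b, t_max=10.0, n=10_000):
    """Output I_N(t) of dI_i/dt = k_f*I_{i-1} - k_b*I_i after a unit impulse in S at t = 0."""
    A = -k_b * np.eye(N) + k_f * np.eye(N, k=-1)   # linear cascade dynamics
    I = np.zeros(N)
    I[0] = k_f                                     # the impulse makes I_1 jump by k_f
    h = t_max / n
    out = np.empty(n + 1)
    out[0] = I[-1]
    for j in range(n):                             # classical RK4 steps
        k1 = A @ I
        k2 = A @ (I + 0.5 * h * k1)
        k3 = A @ (I + 0.5 * h * k2)
        k4 = A @ (I + h * k3)
        I = I + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        out[j + 1] = I[-1]
    return np.linspace(0.0, t_max, n + 1), out

def closed_form(N, k_f, k_b, dt):
    return k_f**N * dt**(N - 1) * np.exp(-k_b * dt) / factorial(N - 1)

k_f, k_b = 1.5, 2.0                                # hypothetical rate constants
for N in (1, 2, 3):
    dt, f_num = impulse_response(N, k_f, k_b)
    print(N, np.max(np.abs(f_num - closed_form(N, k_f, k_b, dt))), dt[np.argmax(f_num)])
```

The printed peak positions move further into the past as levels are added, which is the sense in which extra layers lengthen the memory of the cascade.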

FIG. 3: Multilevel reversible cascades extend the integration
time over which the output remembers past signals. A cascade
with $N$ levels places maximum weight on past time points
$k_b \Delta t^* = N - 1 = 0, 1, 2$ for cascades with 1 (blue), 2 (green),
or 3 (red) levels, respectively. The plotted weighting functions
have been normalized to integrate to 1.

By remembering farther into the past, multilevel reversible
cascades can improve the performance of signaling
networks, provided $T_o$ is not limiting. Using Eq. 3
in the main text, the estimation error for a one-level reversible
cascade, $N = 1$, is:

$$\sigma_{\hat c}^2[X(T_o)] = \sigma_{\hat c}^2[S]\,\frac{1}{1 + \dfrac{1}{k_b \tau_c}} \qquad (10)$$

in the limit $T_o \gg 1/k_b$. This equation is similar to the
extrinsic component of the noise-addition rule, after rearrangement
of terms [Q3I2]. The solution for general $N$ can
be obtained in terms of the hypergeometric function but
is difficult to analyze. For a two-level ($N = 2$) cascade
the estimation error is:

$$\sigma_{\hat c}^2[X(T_o)] = \sigma_{\hat c}^2[S]\,\frac{1 + \dfrac{1}{2 k_b \tau_c}}{\left(1 + \dfrac{1}{k_b \tau_c}\right)^2} \qquad (11)$$

which is smaller than the error for the one-level cascade
for all finite values of $k_b \tau_c$. The improvement is greatest
in the limit of slow-decaying molecules, $k_b \tau_c \to 0$; then,
the error of a two-level cascade is half that of a one-level
cascade. More generally, by combining different multi-level
cascades that peak at different times in the past,
networks can both shape weighting functions and extend
the range over which they are nonzero, even when the
lifetimes of signaling molecules are limited.
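
The one-level and two-level errors can be compared numerically. In the sketch below (an illustration, not the authors' code) the double integral of Eq. 3 in the main text is evaluated on a midpoint grid for weighting functions $\Delta t^{N-1} e^{-k_b \Delta t}$ with a long observation time, and the results are compared with the closed forms $1/(1 + 1/x)$ and $(1 + 1/(2x))/(1 + 1/x)^2$, where $x = k_b \tau_c$. These closed forms were obtained by carrying out the integral analytically and are assumed here to correspond to Eqs. 10 and 11; the function name and parameter values are hypothetical.

```python
import numpy as np

def cascade_error(N, k_b, tau_c, T_o=30.0, n=2000):
    """Error ratio sigma^2[X]/sigma^2[S] for an N-level reversible cascade (T_o >> 1/k_b)."""
    h = T_o / n
    dt = (np.arange(n) + 0.5) * h                 # midpoints of [0, T_o]
    f = dt ** (N - 1) * np.exp(-k_b * dt)         # weighting function up to a constant
    f /= f.sum() * h                              # normalize to unit integral
    C = np.exp(-np.abs(dt[:, None] - dt[None, :]) / tau_c)
    return float(f @ C @ f) * h * h

k_b, tau_c = 1.0, 0.5                             # hypothetical values, x = k_b*tau_c = 0.5
x = k_b * tau_c
one_level = cascade_error(1, k_b, tau_c)
two_level = cascade_error(2, k_b, tau_c)
print(one_level, 1 / (1 + 1 / x))
print(two_level, (1 + 1 / (2 * x)) / (1 + 1 / x) ** 2)
```

For these values the two-level cascade has the smaller error, and repeating the calculation with smaller $k_b \tau_c$ (and correspondingly longer $T_o$) moves the ratio of the two errors toward one half.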



Non-stationary signals 

We consider the non-stationary signal $S(t)$ introduced
in the main text, generated by $\emptyset \rightleftharpoons S$ with
$d\mu_S/dt = k_p c - k_r \mu_S$ and $S(0) = 0$. The correlation time is
$\tau_c = 1/k_r$. Because the signal is nonstationary, the
weighting function $f(t; T_o)$ no longer reduces to $f(T_o - t)$.
In what follows we write $f(t)$, where the argument is the
time directly (i.e. the time since the change in environment)
and not $\Delta t$, as for the stationary input signal. The
response of a signaling network with weighting function
$f(t)$ is $X(T_o) = \int_0^{T_o} f(t)\,S(t)\,dt$. As for a stationary signal,
the variance of an estimate based on $X(T_o)$ is given
by Eq. 6, when $X(T_o)$ is linear over the fluctuations in
$S(t)$. Note that $\mu_S(t) = k_p c \tau_c (1 - \exp(-t/\tau_c))$, so that
$dE[X(T_o)|c]/dc$ in the denominator of Eq. 6 is:

$$\frac{dE[X(T_o)|c]}{dc} = \int_0^{T_o} f(t)\,\frac{d\mu_S(t)}{dc}\,dt = k_p \tau_c \int_0^{T_o} f(t)\,(1 - \exp(-t/\tau_c))\,dt. \qquad (12)$$

The numerator of Eq. 6 is:

$$\begin{aligned}
\sigma_{X(T_o)}^2 &= E\left[\left(X(T_o) - \mu_{X(T_o)}\right)^2\right]
= E\left[\left(\int_0^{T_o} f(t)\,(S(t) - \mu_S(t))\,dt\right)^2\right] \\
&= \int_0^{T_o}\!\!\int_0^{T_o} f(t_1)\,C(t_1, t_2)\,f(t_2)\,dt_1\,dt_2, \qquad (13)
\end{aligned}$$

as for a stationary signal, except that the correlation
function for the non-stationary signal is:

$$C(t_1, t_2) = k_p c \tau_c \left(e^{-|t_2 - t_1|/\tau_c} - e^{-\max(t_1, t_2)/\tau_c}\right). \qquad (14)$$
For this non-stationary case, we define a normalized
weighting function $\hat f$, which differs from that for the stationary
case presented in the main text:

$$\hat f(t) = \frac{f(t)}{\int_0^{T_o} f(t')\,(1 - \exp(-t'/\tau_c))\,dt'}, \qquad (15)$$

so that $\int_0^{T_o} \hat f(t)\,(1 - \exp(-t/\tau_c))\,dt = 1$. Additionally, we
define $\hat C$ to be the correlation function normalized by the
signal's variance $C(t_1 = t_2)$ in steady state, $k_p c \tau_c$.

Combining Eqs. 6, 12, 13, and 15, the analogue of Eq.
3 in the main text for this non-stationary signal is:

$$\sigma_{\hat c}^2[X(T_o)] = \frac{c}{k_p \tau_c} \int_0^{T_o}\!\!\int_0^{T_o} \hat f(t_1)\,\hat C(t_1, t_2)\,\hat f(t_2)\,dt_1\,dt_2. \qquad (16)$$

The prefactor can be interpreted as the error of an estimate
of $c$ based only on an instantaneous observation
of the signal in steady state (i.e. $T_o \gg \tau_c$):
$\sigma_{\hat c}^2[S] = \sigma_S^2/(d\mu_S/dc)^2 = k_p c \tau_c/(k_p \tau_c)^2 = c/(k_p \tau_c)$.

To minimize Eq. 16 we differentiate with respect to $\hat f$,
using a Lagrange multiplier to enforce the normalization
constraint:

$$\int_0^{T_o} \hat C(t_1, t_2)\,\hat f(t_2)\,dt_2 - \lambda\,(1 - \exp(-t_1/\tau_c)) = 0. \qquad (17)$$

One way to solve Eq. 17 is to differentiate three times
with respect to $t_1$, resulting in an ordinary differential
equation after substitution of intermediate derived equalities
(see ref. [3] for a discussion of this method for solving
integral equations). The solution is (Fig. 4):

$$\hat f^*(t) = c_1\,\frac{1}{\left(2 - e^{-t/\tau_c}\right)^2} + c_2\,\delta(t - T_o), \qquad (18)$$

$$c_1 = \frac{4}{T_o + \tau_c \ln\left(2 - e^{-T_o/\tau_c}\right)}, \qquad (19)$$

$$c_2 = c_1\,\frac{\tau_c}{2\left(2 - e^{-T_o/\tau_c}\right)}. \qquad (20)$$

Note that the weighting function has units of 1/time, $c_1$
has units of 1/time, $c_2$ has no units, and the delta function
has units of 1/time. The weight placed on the final
time point grows relative to the weight placed on other
points as $T_o/\tau_c$ decreases, as measured by its contribution
to the integral of the weighting function. The first
term approaches a constant weight for $t > \tau_c$.

FIG. 4: Optimal weighting functions for a non-stationary
signal. The optimal weighting function is plotted for $T_o = 1$
for a signal with $\tau_c = 0.1$ (green), 1 (red), and 10 (blue). For
comparison, the weighting functions have been renormalized
to integrate to 1; i.e. we have multiplied by a factor $\gamma$ so that
$\gamma \int_0^{T_o} \hat f\,dt = 1$. The delta functions at time $T_o$ are truncated
for illustration, with height equal to their respective coefficients.
Some minimal weight is placed on all points; how
much depends on $\tau_c$ and $T_o$. Then additional weight is placed
on early and late data points, because of correlations in the
input signal. The final time point dominates the estimate
when $T_o \ll \tau_c$ (blue curve).

The corresponding minimal estimation error is, as in
the main text:

$$\sigma_{\hat c}^2[X^*(T_o)] = \frac{\sigma_{\hat c}^2[S]}{T_o/(2\tau_c) + \ln\left(2 - e^{-T_o/\tau_c}\right)/2}. \qquad (21)$$

The short and long time limits of Eq. 21 can be motivated
with simple arguments. For short observation
times $T_o \ll \tau_c$, the estimate is essentially constructed
from $S(T_o)$ only, since Eq. 18 indicates that all weight
is placed on the final time point in that limit. Because
no S molecules decay on the short time $T_o \ll \tau_c$, the
number of S molecules at time $T_o$ is Poisson distributed
with arrival rate $k_p c$, mean $k_p c T_o$, and variance $k_p c T_o$.
The variance of an estimate based on $S(T_o)$ is then, from
Eq. 6, $(k_p c T_o)/(k_p T_o)^2 = c/(k_p T_o)$, the short time approximation
of Eq. 21. The long time approximation
of Eq. 21 is $\sigma_{\hat c}^2[S]/(T_o/(2\tau_c))$, identical to the long-time
approximation for an estimate based on a stationary signal
of equivalent duration (see Eq. 5 in the main text).
The effect of the non-stationarity is washed out on long
time scales. For finite times, the non-stationary signal
contains less extractable information than a stationary
signal of equal duration.
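
The non-stationary optimum can also be checked with a discretized calculation. The sketch below is an illustration under stated assumptions rather than the authors' method: it samples the signal at grid points $t_i > 0$, minimizes the quadratic form built from the normalized non-stationary correlation function $e^{-|t_1 - t_2|/\tau_c} - e^{-\max(t_1, t_2)/\tau_c}$ subject to the normalization $\sum_i w_i (1 - e^{-t_i/\tau_c}) = 1$, and compares the result with $1/[T_o/(2\tau_c) + \ln(2 - e^{-T_o/\tau_c})/2]$ and with a delta-function coefficient $c_2 = c_1 \tau_c / [2(2 - e^{-T_o/\tau_c})]$, where $c_1 = 4/[T_o + \tau_c \ln(2 - e^{-T_o/\tau_c})]$. The function name, grid size, and parameter values are hypothetical.

```python
import numpy as np

def nonstationary_optimum(T_o, tau_c, n=600):
    """Minimize w^T C w subject to m.w = 1, with m(t) = 1 - exp(-t/tau_c)."""
    t = np.linspace(0.0, T_o, n + 1)[1:]          # drop t = 0, where the signal carries no information
    m = 1.0 - np.exp(-t / tau_c)
    C = (np.exp(-np.abs(t[:, None] - t[None, :]) / tau_c)
         - np.exp(-np.maximum(t[:, None], t[None, :]) / tau_c))
    w = np.linalg.solve(C, m)                     # stationarity: C w = lambda * m
    w /= m @ w                                    # enforce the normalization constraint
    return t, w, float(w @ C @ w)

T_o, tau_c = 3.0, 1.0                             # hypothetical values
t, w, err = nonstationary_optimum(T_o, tau_c)

closed = 1.0 / (T_o / (2 * tau_c) + 0.5 * np.log(2 - np.exp(-T_o / tau_c)))
c1 = 4.0 / (T_o + tau_c * np.log(2 - np.exp(-T_o / tau_c)))
c2 = c1 * tau_c / (2 * (2 - np.exp(-T_o / tau_c)))

print(err, closed)                                # discretized vs. closed-form error ratio
print(w[-1], c2)                                  # finite weight on the final time point
print(err > 1.0 / (T_o / (2 * tau_c) + 1.0))      # larger error than the stationary limit
```

In this discretization only the final sample carries a finite weight, while early samples carry an enhanced but smooth weight, consistent with the description of the optimal weighting function given above.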